BMC Genomics — Latest Matching Preprints

1

High throughput single-cell RNA sequencing of intact adult cardiomyocytes and non-myocytes using a split-pool approach

Hu, Y.; Gurung, R.; Mueller, S.; Villanueva, E.; Stenzig, J.; Rayan, N.; Luu, T. D. A.; Nur, S.; Tan, B.; Liu, B.; Yu, H.; Choi, H.; Foo, R.; Ackers-Johnson, M. A.

2026-04-30 cell biology 10.64898/2026.04.28.721288 medRxiv

Top 0.1%

23.5%

MOTIVATION: Adult cardiomyocytes are difficult to profile by whole-cell single-cell RNA sequencing because of their large size and fragility, which make them poorly compatible with standard workflows. Current approaches for adult cardiomyocyte transcriptomics often require a trade-off between data quality and throughput, so studies instead rely heavily on sequencing of nuclei alone. We therefore set out to develop a high-quality and scalable workflow for adult heart cells using in-cell ligation and split-pool barcoding strategies to address this methodological gap. This workflow may be further generalisable to other large cell types or samples containing cell populations with highly unequal RNA content. SUMMARY: Adult cardiomyocytes are difficult to profile by whole-cell single-cell RNA sequencing (scRNA-seq). Here, we developed a high-quality and scalable workflow for adult heart cells using in-cell ligation and split-pool barcoding. We identified per-cell RNA content as a significant variable that must be accounted for. Separation of cardiomyocytes (large cells) and non-cardiomyocytes (small cells) before library construction, and allocation of deeper sequencing to cardiomyocytes, produced high-quality whole-cell datasets for both compartments. Compared with single-nucleus RNA sequencing, whole-cell cardiomyocyte profiling better recovered metabolic, mitochondrial, cytoplasmic translational, and contractile gene programs. This workflow provides a practical method for scalable, high-quality cardiomyocyte whole-cell scRNA-seq and offers general strategies for other large cell types or samples containing cell populations with highly unequal RNA content.

2

Detection and evaluation of copy number variation using both linked-read and short-read sequencing in New Zealand dairy cattle

Wang, Y.; Nugroho, T.; Johnson, T. J. J.; Couldrey, C.; Harris, B. L.

2026-04-23 bioinformatics 10.64898/2026.04.20.718595 medRxiv

Top 0.1%

22.5%

In recent years, genetic studies have made significant progress in identifying single-nucleotide polymorphisms (SNPs) associated with cattle health and production traits. However, it is still challenging to identify and validate more complicated forms of variation, such as copy number variation (CNV) and other types of structural variation (SV). In this study, SV regions were identified using 37 New Zealand dairy cattle with linked-read sequence data. A transmission-based framework was used to validate these variants at the population scale. 62,438 putative autosomal SV regions were identified with the LongRanger pipeline following the 10x Genomics recommendations. Copy number states for these regions were subsequently estimated via a read-depth based genotyping method using CNVpytor in a population-representative cohort of 2306 animals using Illumina short-read sequencing technology. Mendelian inheritance of copy number states was assessed using linear mixed models incorporating pedigree information, and transmission levels were used to quantify the biological validity of each CNV region. Transmission levels ranged widely, with a mean of 0.5162 across all regions, where higher transmission levels were proportionally enriched for larger SVs. A total of 7218 CNV regions exhibited high transmission levels (>0.9), indicating strong evidence of inheritance. Among these, 7136 overlapped CNV regions reported in one or more public datasets, while 82 high-confidence regions represent previously unreported variants. High-transmission CNV regions tended to show clear, discrete inheritance patterns in trio families, providing the biological evidence that these CNVs are inherited within the population. Together, these results demonstrate that integrating linked-read sequencing with population-scale transmission-based validation provides a robust framework for identifying high-confidence CNV regions. 
This catalogue of validated CNV regions represents an important resource for downstream functional analyses and the incorporation of structural variation into genomic selection and breeding programs.

3

Assessment of Oxford Nanopore whole genome sequencing for large-scale genomic characterisation of Staphylococcus aureus

Haugan, I.; Flatby, H. M.; Lysvand, H.; Skei, N. V.; Zaragkoulias, K.; Solligard, E.; Ronning, T. G.; Olsen, L. C.; Damas, J. K.; Afset, J. E.; As, C. G.

2026-04-01 genomics 10.64898/2026.03.30.715209 medRxiv

Top 0.1%

20.1%

Whole-genome sequencing (WGS) is increasingly being utilised in microbial diagnostics, surveillance, and research. In this paper we assess the performance of one leading long-read sequencing technology, Oxford Nanopore Technology (ONT), on 836 Staphylococcus aureus bacteraemia isolates. We compare the results to those of a leading short-read sequencing technology, Illumina. All isolates were sequenced using ONT MinION Mk1B and Illumina HiSeq or MiSeq. Libraries were prepared according to the manufacturers' instructions. Preprocessing and downstream bioinformatic analyses were performed using a combination of in-house pipelines and publicly available software tools. The average base substitution error rate in ONT assemblies was low but varied between sequence types, possibly due to lineage-specific methylation patterns. Multilocus sequence typing was similar between the technologies, while ONT assemblies allowed for better spa typing than Illumina assemblies. The reported detection rate was similar between ONT and Illumina assemblies for most virulence- and AMR-associated genes and variants. For 42 (22.2%) of 189 genes/variants, the two technologies disagreed in gene detection in 5 isolates or more, and in 39 (20.6%) of these the highest detection rate was found with ONT. Discrepancies were mainly associated with low GC content, multiple repetitive segments, and small plasmids. Polishing of ONT data resulted in minor changes in gene/variant calling. Our study supports the use of ONT WGS for bacterial population genomic studies on a large collection of S. aureus isolates. While assembly of ONT reads may be affected by its own methodological limitations, it was superior to Illumina assemblies in detection of potentially clinically relevant genes and variants at a low read error rate. Understanding the advantages and limitations of WGS technologies is essential before undertaking studies involving such methods on large sets of bacteria. 
Author summary: In this paper, we present a practical assessment of one important whole genome sequencing (WGS) method, Oxford Nanopore Technology (ONT), and compare its performance in bacterial population genomics to that of WGS with Illumina technology. Our goal was to investigate the usefulness of ONT in studies aiming to identify clinically relevant bacterial characteristics in large collections of bacteria, such as genotype-phenotype studies. We sequenced a large set of clinical S. aureus isolates from episodes of bloodstream infections using both ONT and Illumina technologies and performed analyses with widely used software and bioinformatic pipelines. We have elucidated inherent strengths and limitations of ONT and Illumina sequencing and report some of the practical consequences of these on bacterial typing and detection of clinically relevant genes. With this study, we present one of the most comprehensive assessments of long-read sequencing technology for the genomic characterisation of clinical bacterial isolates, and the findings provide guidance for researchers considering WGS in large-scale bacterial genomics.

4

A tissue-resolved transcriptomic atlas of adult male Halyomorpha halys reveals tissue-specific RNAi machinery and a minimal systemic response to non-specific dsRNA

Amineni, V. P. S.; Ramapuram, S.; Panfilio, K. A.

2026-05-29 genomics 10.64898/2026.05.26.728018 medRxiv

Top 0.1%

19.7%

Background: Halyomorpha halys (brown marmorated stink bug) is an invasive polyphagous pest causing significant agricultural damage worldwide and is an emerging target for RNAi-based pest management. Despite growing interest in dsRNA-based biocontrol, progress is constrained by the lack of tissue-resolved transcriptomic resources covering key biological processes such as feeding, detoxification, and reproduction. Furthermore, our understanding of how RNAi machinery expression varies across tissues remains limited, which impairs both target gene selection and predictions of RNAi efficacy. Critically, the transcriptional response of H. halys to haemolymph-delivered non-specific dsRNA represents a key knowledge gap for evaluating potential non-target immune reactions of dsRNA-based approaches. Results: Field-collected adult males were injected with either nuclease-free water or dsRNA targeting GFP (dsGFP), and transcriptomes were generated from the brain, midgut, salivary glands, and testes. Sequencing produced high-quality datasets with clear tissue-level separation and tight clustering of biological replicates. As expected in targeting a non-endogenous gene, differential expression analysis revealed a limited transcriptional response to dsGFP. Baseline profiling of RNAi pathway genes in controls showed broad expression of core siRNA and miRNA components across all tissues, yet with marked specialisation: two additional Argonaute-2 isoforms and multiple piRNA factors were testes-specific, whereas salivary glands showed strong, restricted expression of nuclease-encoding genes, including a T2 ribonuclease and a non-specific endonuclease. Expression atlases also revealed pronounced tissue partitioning for other protein families. Consistent with their respective functions, secreted trypsins and chymotrypsins are salivary-enriched while the cathepsins for intracellular protein catabolism are midgut-enriched, with brain-centred neuropeptide expression. 
However, we also uncovered unexpected nuance, such as closely related subfamilies of Cytochrome P450s, which generally function as detoxification enzymes, being partitioned between the midgut, brain or testes. Conclusions: This work delivers the first tissue-resolved transcriptomic atlas of adult male H. halys, providing a high-resolution resource on compartmentalization of proteolysis, detoxification, and neuroendocrine signalling, as well as for candidate gene discovery in RNAi-based pest control. The modest, tissue-restricted transcriptional response to non-specific dsRNA, together with strong tissue-specific enrichment of some components, offers mechanistic insight into tissue-dependent RNAi efficiency and supports rational dsRNA target selection in H. halys.

5

Sample barcoding-associated technical variation in probe-based single-cell RNA sequencing

Weir, J. A.; Krebs, Y.; Chen, F.

2026-04-08 genomics 10.64898/2026.04.06.716804 medRxiv

Top 0.1%

14.8%

Probe-based single cell RNA sequencing approaches are increasingly becoming a technology of choice for profiling gene expression at scale and in archival tissues. The 10x Genomics Flex v1 assay enables cost-effective and high-sensitivity single-cell RNA sequencing by splitting samples across up to 16 uniquely barcoded probe sets before pooling and loading onto a single lane of a microfluidic chip. A natural consequence of this design is to leverage probe set barcoding as a sample barcoding strategy for case-control experiments. However, we observed that Flex v1 probe set barcode identity drives substantial technical variation between probe set barcodes, an effect that is reproducible across lanes and independent datasets. When Flex v1 probe set barcodes are confounded with biological sample identity, a concerning number of differentially expressed genes at standard thresholds are false positives. The Flex v2 assay, which decouples sample barcoding from probe set hybridization, significantly reduces this artifact. As the field continues to expand adoption of probe-based assays, our findings introduce probe set barcoding as an underappreciated source of technical variation in single-cell assays and emphasize the importance of experimental design when using probe-based sequencing technologies.

6

A chromatin accessibility map of pea aphid brain and embryo identifies tissue-specific regulatory elements

Liu, X.; Brisson, J. A.

2026-05-15 genomics 10.64898/2026.05.14.725175 medRxiv

Top 0.1%

14.3%

The pea aphid (Acyrthosiphon pisum) is an important model organism for studying complex biological traits, including wing polyphenism and host-symbiont interactions, yet its regulatory genomic landscape remains largely uncharacterized. Here we present the first genome-wide chromatin accessibility map of the pea aphid, generated using the assay for transposase-accessible chromatin followed by sequencing (ATAC-seq). We profiled open chromatin regions (OCRs) in adult brains and late-stage embryos from winged and wingless morphs maintained under solitary or crowded conditions. We also paired ATAC-seq with RNA-seq in embryonic samples to examine the relationship between chromatin accessibility and gene expression. Libraries showed a high abundance of reads from the aphid endosymbionts Spiroplasma and Buchnera, reflecting preferential Tn5 transposase insertion into nucleosome-free bacterial DNA. After computational removal of these reads, the remaining aphid-mapping libraries displayed hallmarks of high-quality ATAC-seq data. We identified a consensus set of 37,127 OCRs enriched at promoters and distal regulatory elements, with substantial overlap with computationally predicted enhancers and enrichment for transcription factor binding motifs. Tissue identity was the dominant driver of chromatin variation, accounting for 85% of variance along the first principal component, with 19,513 differentially accessible regions distinguishing brain from embryo samples. By contrast, differences associated with wing morph or crowding treatment were modest. Promoter accessibility was significantly and positively correlated with gene expression genome-wide. Together, these data constitute a foundational regulatory genomics resource for the pea aphid and establish a framework for mechanistic studies of gene regulation in this ecologically and economically important insect.

7

Genome-wide Identification of Transcriptional Start Sites and Candidate Enhancers Regulating Worker Metamorphosis in Apis mellifera

Toga, K.; Yokoi, K.; Bono, H.

2026-03-16 genomics 10.64898/2026.03.12.711487 medRxiv

Top 0.1%

14.2%

Eusociality in bees represents a major evolutionary transition and understanding its molecular basis is fundamental for sociogenomic studies. Comparative genomics has revealed correlations between transcription factor binding site (TFBS) abundance and social complexity; however, when and where these TFBSs function in a eusocial context remains largely unclear. In this study, we performed cap analysis of gene expression (CAGE) during worker metamorphosis in the honeybee Apis mellifera to identify TFBSs within active enhancers and decipher the regulatory relationships between these enhancers and their target genes. We identified 17,349 transcription start sites (TSSs) and 842 candidate enhancers. Using CAGE, we identified five clusters based on expression patterns. Notably, genes associated with the canonical metamorphic regulators, Broad complex (Br-c) and E93, were found within specific clusters. By integrating the correlations between enhancer and TSS activities with motif enrichment analysis, we identified 15 transcription factor-enhancer-TSS regulatory relationships. Among these, tramtrack (ttk)-binding sites were identified in five enhancers associated with four target genes, including Br-c. The number of target genes regulated by ttk was the highest in our dataset. To examine whether this regulatory relationship is conserved across bee species with varying levels of sociality, we analyzed the sequence conservation of ttk-binding sites in Br-c enhancers and found that perfect sequence conservation of the ttk-binding site was restricted to the Apis genus. The ttk-binding sites of other target genes exhibited the same Apis-specific conservation pattern. Our findings suggest that gene regulatory relationships during worker metamorphosis occur in a lineage-specific manner in the Apis genus. Significance: Honeybees produce distinct castes (queens and workers) from genetically identical larvae via differences in gene regulation. 
Although enhancers have been computationally predicted, their actual activity during bee development has rarely been measured directly, and the CAGE technology has never been applied for this purpose. We identified active enhancers during worker metamorphosis and discovered that the transcription factor ttk may regulate Br-c, a key developmental gene. This study provides the first direct evidence of active enhancers and their regulatory roles in honeybee worker metamorphosis.

8

Genetic Diversity of Cytochrome P450 Genes in Apis mellifera Subspecies

Li, F.; Lima, D.; Bashir, S.; Yadro Garcia, C.; Lopes, A. R.; Verbinnen, G.; de Graaf, D. C.; De Smet, L.; Rodriguez, A.; Rosa-Fontana, A.; Rufino, J.; Martin-Hernandez, R.; Medibees Consortium, ; Pinto, M. A.; Henriques, D.

2026-03-24 genomics 10.64898/2026.03.20.713126 medRxiv

Top 0.1%

12.6%

The western honey bee (Apis mellifera) is an essential pollinator facing unprecedented threats from pesticide exposure. While pesticide resistance evolution is well documented in agricultural pests, our understanding of genetic variation in honey bee detoxification systems remains limited. This represents a missed opportunity, as harnessing naturally occurring detoxification diversity could provide new avenues for pollinator protection. Cytochrome P450 monooxygenases (CYPs), which are central to xenobiotic metabolism, offer a promising starting point. Here, we present the first comprehensive analysis of CYP genetic diversity in A. mellifera. We analysed the CYPome of 1,467 individuals representing 18 A. mellifera subspecies from 25 countries and identified 5,756 single-nucleotide polymorphisms (SNPs) in 46 CYP genes. Imputed McDonald-Kreitman testing revealed that 56% of non-synonymous CYP substitutions were driven by positive selection. Of the 1,302 haplotypes identified, 84% resided in CYP3, concentrated in the CYP9 and CYP6AS subfamilies implicated in xenobiotic detoxification. Population-level analysis of nucleotide diversity, Tajima's D selection signatures, FST-based differentiation, and McDonald-Kreitman testing pointed to CYP3 clan genes as the primary locus of adaptive variation. This work provides the first step toward building a comprehensive pharmacogenomic resource for honey bees, enabling the prediction of population-specific pesticide vulnerabilities and leveraging naturally occurring detoxification variants to enhance pollinator resilience, a critical step toward sustainable pollinator management.
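
As a rough illustration of what a figure such as "56% of non-synonymous substitutions driven by positive selection" measures, the sketch below computes the classic McDonald-Kreitman estimator alpha = 1 - (Ds * Pn) / (Dn * Ps). This is the standard (non-imputed) estimator, not the imputed variant named in the abstract, and the counts in the usage note are hypothetical values chosen for illustration, not data from the preprint.

```python
def mk_alpha(dn, ds, pn, ps):
    """Classic McDonald-Kreitman estimate of the adaptive fraction (alpha).

    dn, ds: non-synonymous and synonymous fixed differences (divergence).
    pn, ps: non-synonymous and synonymous polymorphisms.
    All counts here are illustrative inputs, not values from the preprint.
    """
    if dn == 0 or ps == 0:
        raise ValueError("alpha is undefined when Dn or Ps is zero")
    # alpha = 1 - (Ds * Pn) / (Dn * Ps)
    return 1.0 - (ds * pn) / (dn * ps)


if __name__ == "__main__":
    # Hypothetical counts chosen so that alpha comes out at 0.56.
    print(round(mk_alpha(dn=50, ds=100, pn=22, ps=100), 2))
```

With these made-up counts the estimate is 0.56; negative values can occur when slightly deleterious polymorphisms inflate Pn, which is one reason imputed or otherwise corrected variants of the test exist.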

9

The genetic architecture of milk urea concentration in dairy cattle differs across the lactation cycle

He, Q.; Vasiljevic, S.; Kadri, N.; Watson, N.; Stratz, P.; Mapel, X. M.; Leonard, A. S.; Seefried, F. R.; Pausch, H.

2026-04-24 genomics 10.64898/2026.04.22.719978 medRxiv

Top 0.1%

12.2%

Milk urea concentration (MUC) is an indicator of dietary protein utilization and nitrogen use efficiency in dairy cows. We performed genome-wide association studies (GWAS) on MUC in early, mid, and late lactation in the Holstein (HOL) and Brown Swiss (BSW) dairy cattle breeds using imputed sequence variants. We identified 11 and 17 independent quantitative trait loci (QTL) for MUC across the three lactation stages in BSW and HOL, respectively. While many of these QTL have previously been reported for MUC and other dairy traits, our study provides evidence that some QTL exert lactation-stage specific effects. Our findings suggest that variants at the DGAT1 locus on BTA14 have pleiotropic effects on MUC and other dairy traits. This QTL showed an early lactation-specific association with MUC but impacted milk and fat yield across the entire lactation. We fine-mapped two QTL for MUC in early and mid-lactation in BSW on BTA9 (lead SNP: 9:21392941; Pcorrected = 1.1E-17) and BTA28 (lead SNP: 28:6518357; Pcorrected = 3E-11). We identified lncRNA ENSBTAG00000058688 and IBTK as positional and functional candidate genes for the BTA9 QTL, and KCNK1 as a positional and functional candidate gene that harbors a highly significant missense variant for the BTA28 QTL. In conclusion, our results shed light on the genetic architecture of MUC and highlight QTL harboring potential functional variants underpinning milk urea variation within and across breeds.

10

Selecting genomes that matter: haplotype-based prioritization for iterative pangenome expansion

Marone, M. P.; Chen, E.; Himmelbach, A.; Haberer, G.; Spannagl, M.; Stein, N.; Mascher, M.

2026-05-18 genomics 10.64898/2026.05.13.724976 medRxiv

Top 0.1%

12.1%

Background: As pangenomes approach saturation, identifying additional genomes that contribute novel sequence information becomes increasingly difficult. Current sample-selection strategies often rely on global diversity metrics or variant counts and do not explicitly account for the composition of an existing pangenome, a limitation that becomes increasingly relevant as pangenomes mature. Here, we present SelHap, a haplotype-based pipeline that uses whole-genome sequencing (WGS) data to prioritize accessions based on their contribution of novel haplotypes relative to a defined background, enabling targeted and iterative pangenome expansion. Results: We applied SelHap to the barley pangenome, using 76 assembled genomes as a background to select new accessions from a large WGS panel. Using this approach, we generated chromosome-scale genome assemblies from 19 accessions selected with SelHap and from 17 elite lines selected based on their relevance in historical barley breeding. Across multiple benchmarking scenarios, SelHap-based selection consistently resulted in a greater increase in non-redundant (single-copy) pangenome sequence, demonstrating that prioritizing haplotype novelty relative to an existing background maximizes unrepresented sequence content. Conclusions: By transforming complex haplotype-clustering outputs into interpretable summaries and ranked candidate lists, SelHap provides a practical framework for targeted pangenome expansion. Beyond sample selection, SelHap can facilitate ancestry and germplasm comparisons across diverse panels. As WGS data become more accessible, SelHap offers a scalable and interpretable solution for extending mature pangenomes by explicitly targeting previously unrepresented sequence space.

11

A Putative Single-Locus Determinant of the Suppressed In Ovo Virus Infection (SOV) Trait in Apis mellifera

Lefebre, R.; Broeckx, B. J. G.; De Smet, L.; Braeckman, M.; Gregorc, A.; Peelman, L.; de Graaf, D. C.

2026-05-29 genomics 10.64898/2026.05.28.728461 medRxiv

Top 0.1%

10.3%

The deformed wing virus (DWV) is considered one of the major causes of elevated western honey bee (Apis mellifera) colony losses worldwide. Virus transmission may occur horizontally between individuals of the same generation, but also vertically from parents to offspring. The recently defined heritable suppressed in ovo virus infection (SOV) trait describes the absence of viruses in pooled drone eggs of a queen, associated with significantly lower DWV prevalence and viral loads in the subsequent developmental stages of the offspring. By definition, the trait reflects the absence of vertical virus transmission from SOV-positive (SOV+) queens themselves to their offspring. However, the genetic basis of this heritable virus resilience has not yet been identified. In this study, we aimed to identify SOV-associated genetic marker(s) or loci in the honey bee genome through genome-wide variant comparison of 44 DWV-positive and 44 DWV-negative drone pupae descended from an artificially created hybrid SOV+/SOV- colony. After whole genome sequencing (WGS), variant calling, and genotype-phenotype association analysis by means of single marker tests and elastic net regression, one variant in a locus of 241,246 bp on chromosome 7 that contained 17 other highly SOV-associated variants classified 68.2% of the drone phenotypes correctly. These results may support the potential application of marker-assisted selection (MAS) strategies targeting reduced vertical virus transmission in honey bees.

12

Evaluation of a multiplexed tiling PCR scheme for whole-genome amplification of hepatitis B virus using Oxford Nanopore sequencing

Brate, J.; Grande, E. G.; Pedersen, B. N.; Frengen, T. G.; Stene-Johansen, K.

2026-03-31 molecular biology 10.64898/2026.03.28.714721 medRxiv

Top 0.1%

10.1%

Here we evaluated the performance of a previously published tiling PCR primer scheme by Ringlander et al. (2022) for whole-genome amplification of Hepatitis B virus (HBV) in combination with Oxford Nanopore sequencing. The primer set originally developed for Ion Torrent sequencing was adapted by removing platform-specific adapters and tested using clinical serum or plasma samples submitted for routine HBV genotyping and resistance testing. Two multiplexing strategies were compared: a single PCR pool containing all primers and a two-pool strategy with non-overlapping amplicons. Sequencing reads were processed using a Nanopore analysis pipeline, and genome coverage and amplicon performance were compared across samples spanning a wide Ct range and representing HBV genotypes A-E. Across all samples, the median genome coverage was approximately 50%, although recovery varied widely, ranging from complete failure to nearly full genomes. Combining all primers into a single PCR reaction, or separating overlapping amplicons into different reactions, had little overall impact on genome recovery, and no consistent differences between the two pooling strategies were observed. In contrast, amplification efficiency differed markedly between individual amplicons. Amplicons 1-5 generally produced higher sequencing depth, whereas amplicons 6-10 frequently showed low coverage and contributed to incomplete genome recovery. Genome coverage was strongly associated with Ct values, with higher coverage observed in samples with lower Ct values, while coverage was broadly similar across genotypes. These results demonstrate that the Ringlander et al. primer scheme can be adapted for multiplex PCR and Nanopore sequencing of HBV, but uneven amplicon performance limits consistent full-genome recovery and highlights the need for further optimization of HBV tiling PCR designs.

13

Transposable element disruption of a second thyroglobulin-like gene confers Vip3Aa resistance in Helicoverpa armigera

Bachler, A.; Walsh, T. K.; Andrews, D.; Williams, M.; Tay, W. T.; Gordon, K. H.; James, B.; Fang, C.; Wang, L.; Wu, Y.; Stone, E. A.; Padovan, A.

2026-04-09 genomics 10.64898/2026.04.06.716841 medRxiv

Top 0.1%

10.0%

Background: The cotton bollworm Helicoverpa armigera is a major global pest controlled by genetically engineered crops expressing Bacillus thuringiensis (Bt) toxins, including Vip3Aa. While Vip3Aa is widely deployed, the genetic basis of resistance remains poorly understood. Previous work identified disruption of a thyroglobulin-like gene (HaVipR1) as one mechanism of resistance, suggesting additional loci may be involved. Results: Using linkage analysis, transcriptomics, long-read sequencing, and CRISPR-Cas9 gene editing, we identify a second thyroglobulin-like gene, HaVipR2, as a novel mediator of Vip3Aa resistance. Resistance in a field-derived H. armigera line was shown to be monogenic, recessive, and autosomal, mapping to chromosome 29. Long-read sequencing revealed a ~16 kb transposable element insertion disrupting HaVipR2, which was undetectable using standard short-read approaches. CRISPR-Cas9 knockout of HaVipR2 conferred >900-fold resistance, confirming its causal role. Comparative analyses show that HaVipR1 and HaVipR2 share conserved domain architecture, indicating that thyroglobulin-domain proteins represent a recurrent target of resistance evolution. Conclusions: Our findings establish thyroglobulin-domain proteins as a new class of Bt resistance genes in Lepidoptera and demonstrate that transposable element insertions can drive adaptive resistance while evading detection by conventional methods. These results highlight the importance of long-read sequencing and accurate genome annotation for resistance monitoring and provide new insights into the molecular basis and evolution of Vip3Aa resistance.

14

A framework for identifying transcript orthologs: the evolution of sex bias in alternative transcript structure in Drosophila

Bankole, K.; McIntyre, L.; Garan, M.; Morse, A. M.; Keil, N.; Hernandez, A.; Barmina, O.; Khan, M.; Kopp, A.; Rogers, R.; Graze, R. M.

2026-05-26 genomics 10.64898/2026.05.25.727716 medRxiv

Top 0.1%

9.9%

Background: Recent advances in long read technologies provide an unprecedented opportunity to study transcript evolution. However, comparative evolutionary studies, even in Drosophila, are limited by inconsistent and incomplete annotation, and the lack of annotated transcript homology. Results: In this study of five species spanning 28 million years (D. melanogaster, D. simulans, D. yakuba, D. santomea and D. serrata), we infer transcript homology using reciprocal liftover, and orthology using network analyses, with data validation from long read RNA-seq of male and female head tissue. We build the first genus level annotation, with 15,996 genes and 56,370 transcripts. Expressed transcripts are conserved: 73% of transcript orthologs are detected in all species. Even the improved annotation underestimates the number of genes with alternative transcripts, with 75% of genes expressing multiple structurally diverse transcripts. In a replicated quantitative evaluation of ~10,000 genes, both male and female-biased transcripts are expressed in 410 (D. melanogaster), 608 (D. simulans), and 493 (D. serrata) genes and in 118 orthologous genes in the D. melanogaster - D. simulans species pair, indicating greater potential for resolution of sexual conflict by alternative transcription than previously appreciated. We identified 605 transcript orthologs conserved for sex bias in the D. melanogaster-D. simulans species pair and of these, 22 male and 19 female-biased transcripts were conserved in sex bias with the outgroup D. serrata, including transcripts of genes involved in brain development, Sxl target Glutamine synthetase 2 and ciboulot. Conclusions: Conserved alternative transcripts suggest that transcriptional diversity is a pervasive driver of the evolution of functional diversity.

15

Directional Gene-Level Concordance and Methodological Constraints in Blood Transcriptomic and DNA Methylation Studies of Parkinson's Disease

Kaur, R.; Dewan, C.; Chauhan, I.; Sharma, K.; Sharma, S.

2026-05-20 neuroscience 10.64898/2026.05.17.725808 medRxiv

Top 0.1%

8.6%

Assessing reproducibility across different molecular profiling studies is a persistent methodological challenge (Zhang et al., 2009; Sweeney et al., 2017; Ioannidis, 2005). Differences in platform technology, cohort composition, analytical pipelines, and feature definitions often make it difficult to interpret cross-study comparisons based solely on gene-identity overlap. In this study, we conducted a retrospective computational analysis of seven publicly available analytical datasets (including alternative analytical pipelines applied to the same cohort) derived from five biologically independent peripheral blood transcriptomic and DNA methylation cohorts, comprising 3,487 samples (1,824 Parkinson's disease cases and 1,663 controls). Reproducibility was evaluated using gene-identity overlap, enrichment-based comparisons, and a permutation-based framework to assess directional consistency of effect estimates across datasets. We also tested the robustness of results by varying false discovery rate thresholds and applying alternative probe-to-gene collapsing strategies. All analyses were performed using reproducible workflows implemented in R and Python with fixed random seeds. Across independent cohorts, gene-identity overlap was generally limited, with enrichment ratios close to one, especially when datasets were generated using different platforms. In several datasets, limited numbers of statistically significant features further constrained overlap-based comparisons. In contrast, directional consistency showed greater stability. High levels of directional consistency were observed across independent cohort comparisons when restricted to overlapping statistically significant features and remained stable across statistical thresholds (90.0% at FDR < 0.05 and 82.8% at FDR < 0.10). 
When evaluated across the full shared gene universe without conditioning on statistical significance, directional consistency was substantially lower (~30 to 32%) but remained significantly above permutation-based null expectations. Permutation testing confirmed that the observed directional consistency exceeded what would be expected by chance. A combined analysis including methodological replicates (n ≥ 3 datasets) showed 98.3% directional consistency; however, this estimate includes non-independent analytical pipelines applied to the same cohort and reflects analytical stability rather than independent biological replication. Rather than introducing a new statistical method, this study examines how commonly used reproducibility metrics behave under cross-study heterogeneity and identifies their practical limitations and appropriate use boundaries.
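
The directional-consistency idea described above can be illustrated with a minimal sketch. It assumes a simple pairwise definition (the fraction of shared genes whose effect estimates have the same sign in two datasets) and a label-shuffling null with a fixed random seed. The function names, the pairwise definition, and the shuffling scheme are assumptions made for illustration, not the authors' implementation, which the abstract does not specify.

```python
import random


def directional_consistency(effects_a, effects_b):
    """Fraction of shared genes whose effect estimates point the same way.

    effects_a, effects_b: dicts mapping gene symbol -> signed effect estimate
    (for example a log fold change). Genes with a zero estimate are skipped.
    """
    shared = [
        g for g in effects_a
        if g in effects_b and effects_a[g] != 0 and effects_b[g] != 0
    ]
    if not shared:
        raise ValueError("no shared genes with non-zero effect estimates")
    same = sum((effects_a[g] > 0) == (effects_b[g] > 0) for g in shared)
    return same / len(shared)


def permutation_p_value(effects_a, effects_b, n_perm=2000, seed=0):
    """One-sided permutation p-value for the observed consistency.

    The null is built by shuffling dataset B's estimates across the shared
    genes, which breaks the gene-level pairing while keeping B's overall
    balance of positive and negative effects.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    genes = sorted(g for g in effects_a if g in effects_b)
    observed = directional_consistency(effects_a, effects_b)
    values_b = [effects_b[g] for g in genes]
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(values_b)
        permuted_b = dict(zip(genes, values_b))
        if directional_consistency(effects_a, permuted_b) >= observed:
            exceed += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return observed, (exceed + 1) / (n_perm + 1)
```

Shuffling within the shared gene set, rather than drawing random signs, means the null reflects any imbalance between up- and down-regulated genes in dataset B; this is one design choice among several and may differ from the framework used in the study.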

16

QTL spanning the TGF-β2 locus is associated with muscle fiber hypertrophy in rainbow trout

Raghu, A.; Raymo, G.; Ahmed, R.; Ali, A. R.; Leeds, T.; Salem, M.

2026-05-27 genomics 10.64898/2026.05.24.727516 medRxiv

Top 0.1%

8.5%

Background: Skeletal muscle growth is a key determinant of body size and market value in salmonid aquaculture, yet the mechanisms linking genomic variation to muscle fiber hypertrophy remain poorly resolved. Myofiber cross-sectional area (CSA) provides a quantitative cellular proxy for fiber size and a direct link to macroscopic growth traits. Methods: We performed histological phenotyping of white skeletal muscle from rainbow trout (Oncorhynchus mykiss) representing divergent fillet-yield selection lines (ARS-FY-H and ARS-FY-L), quantifying mean myofiber CSA and fiber number using high-throughput image analysis. Genome-wide association analysis (GWAS) was conducted using low-pass whole-genome sequencing (~1x) with genotype imputation and functional variant annotation. RNA sequencing was performed on fish representing high and low CSA extremes to identify differentially expressed genes and enriched biological pathways. Results: Mean myofiber CSA was significantly associated with body weight, muscle weight, visceral weight, and body length (p < 0.05), while fiber count showed no significant association with most growth traits, implicating hypertrophy as the primary driver of muscle mass variation. GWAS identified a significant QTL spanning ~4.76 Mb on chromosome 2 (117 significant SNPs; Bonferroni-adjusted P ≤ 0.05; λ = 1.02). Associated variants were predominantly noncoding, enriched in intronic, intergenic, and enhancer-annotated regions. A high density of SNPs colocalized with the TGF-β2 locus, overlapping strong and genic enhancer elements in white muscle. Transcriptomic comparisons revealed that high-CSA muscle showed elevated expression of genes related to contractile function, cytoskeletal organization, and translation, while low-CSA muscle exhibited upregulation of extracellular matrix and immune-related genes consistent with a tissue remodeling state. 
Conclusions: Noncoding regulatory variation within a significant QTL spanning the TGF-β2 locus is associated with distinct transcriptional programs linked to muscle fiber hypertrophy in rainbow trout. By integrating genetic variation, chromatin-state annotation, and transcriptomic profiling, this study identifies candidate regulatory loci associated with variation in muscle cellularity and growth-related phenotypes in rainbow trout.
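
For readers unfamiliar with the two GWAS summary quantities quoted in the abstract above, the following minimal sketch shows how a Bonferroni adjustment and a genomic inflation factor are commonly computed from a list of p-values. It assumes that the reported lambda is the usual genomic inflation factor (median observed 1-df chi-square statistic divided by its expected median under the null); the abstract does not state this explicitly, and the function names are hypothetical.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom (about 0.455),
# obtained as the squared 75th percentile of the standard normal.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def genomic_inflation(p_values):
    """Genomic inflation factor from two-sided p-values in (0, 1].

    Each p-value is converted back to a 1-df chi-square statistic via the
    normal quantile function; a value near 1 suggests little inflation.
    """
    nd = NormalDist()
    chi2 = [nd.inv_cdf(1.0 - p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN
```

Under this convention, a value such as 1.02 would indicate test statistics close to their null expectation across the genome, which is why it is typically reported alongside the significance threshold.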

17

Transposable elements as new players to decipher sex differences in Parkinson Disease

Gordillo-Gonzalez, F.; Galiana-Rosello, C.; Grillo-Risco, R.; Soler-Saez, I.; Hidalgo, M. R.; Siomi, H.; Kobayashi-Ishihara, M.; Garcia-Garcia, F.

2026-03-30 bioinformatics 10.64898/2026.03.27.714370 medRxiv

Top 0.1%

8.5%

We present a novel integrative analysis of transposable elements (TEs) in 4 single cell RNA-seq (scRNA-seq) datasets of postmortem substantia nigra pars compacta samples from Parkinson Disease (PD) patients and matched healthy controls, with the objective of building a trustworthy cell-type-specific atlas of TEs that may clarify the role of TEs in sex differences in PD. We used the soloTE tool to evaluate TE expression changes across all snRNA-seq studies identified in our previous systematic review, and then integrated the results using meta-analysis techniques. Finally, we evaluated possible associations between TEs and protein-coding genes by integrating our previous results on this question with the TE information obtained here, in order to propose possible mechanisms of action by which some TEs contribute to PD.

18

Benchmarking long-read RNA-seq across modalities, methods, and sequencing depth in iNeurons

Schubert, R.

2026-04-04 bioinformatics 10.64898/2026.04.01.715783 medRxiv

Top 0.1%

8.5%

Long-read RNA sequencing (lrRNA-seq) provides advantages for transcript discovery and quantification through the sequencing of full-length transcripts. Although recent benchmarks have evaluated long-read technologies and quantification tools, to the best of our knowledge, no study to date has jointly compared sequencing technology, quantification choice, and depth across both bulk and single-cell platforms. Here, we generate a matched dataset using NGN2-induced neurons derived from Fragile X syndrome and isogenic rescue lines, profiled with bulk and single-cell Illumina, Oxford Nanopore Technologies (ONT), and Pacific Biosciences (PB) Kinnex technologies. All platforms and technologies capture the expected FMR1 reactivation signal. We find that PB bulk under-detects and under-quantifies short transcripts (less than 1.25 kb), ONT bulk under-detects and under-quantifies long transcripts (greater than 5 kb), and single-cell long-read technologies detect a large number of single-cell-specific transcripts associated with truncations. Across six bulk and four single-cell long-read quantification tools, Isosceles, Miniquant, and Oarfish provide the best compromise between computational efficiency and performance in terms of quantification accuracy as measured by spike-ins, comparisons to Illumina, and consensus-based downstream tasks such as differential transcript expression (DTE). Depth-equivalency analyses reveal that PB single-cell sequencing requires approximately three- to four-fold greater depth than bulk to reach comparable power for transcript discovery and differential transcript expression. Our work aims to offer practical guidance for study design, including the choice of technology, sequencing depth, and quantification method. In addition, we hope our data may serve as a reference dataset to evaluate emerging long-read transcriptomic protocols and methods, as well as to investigate FMR1 biology more closely.

19

Genomic Variability of the HCT116 Cell Line Identified Using Oxford Nanopore Sequencing

Leonov, P.; Mikheeva, R.; Koryukov, M.; Ruleva, E.; Karabut, E.; Kechin, A.

2026-04-24 genomics 10.64898/2026.04.23.720331 medRxiv

Top 0.1%

8.5%

HCT116 is a colorectal cancer cell line frequently used in anti-tumor drug development experiments as well as in studies of the molecular machinery of eukaryotic cells. It is well characterized by the presence of several single-nucleotide and short mutations in multiple oncogenes and tumor suppressor genes, including KRAS, PIK3CA, MLH1, CTNNB1, CDKN2A, TGFBR2, and BRCA2. However, its landscape of large genomic rearrangements (LGRs) and copy number variants (CNVs) is still far from being fully understood. Therefore, the aim of this study was to identify LGRs and CNVs in several HCT116 cell line samples using Oxford Nanopore sequencing technology, including three samples from the SRA NCBI database, and to compare common and unique variants across all samples. Using the recently developed eLaRodON tool, we identified 22,666 common LGRs, among which more than 70% of tandem duplications and deletions larger than 80 kb were confirmed by CNV analysis. Among LGRs affecting protein-coding sequences, two in-frame rearrangements were identified: a deletion of exons 4-6 and a duplication of exon 10 in the CCSER1 gene, which encodes a cell division regulator protein. Given its high rearrangement rate in various tumors and the clinical significance of its overexpression, this finding may be potentially useful in future research on this cell line. Regarding differences between samples, we found that LGRs in the laboratory sample and in one of the three SRA NCBI samples occurred more frequently via ALR/Alpha repeats than via Alu repeats, in contrast to common LGRs and those unique to the other samples, a finding that may indicate the presence of unique mechanisms of genomic instability. Thus, this study reveals a broad spectrum of large genomic rearrangements and copy number variants that can be identified in the HCT116 cell line using Oxford Nanopore sequencing, including rearrangements specific to distinct cell line samples.

20

Barcode Crosstalk in ONT Multiplex Sequencing: Quantification and Mitigation Strategies

Scharf, S. A.; Spohr, P.; Ried, M. J.; Haas, R.; Klau, G. W.; Henrich, B.; Pfeffer, K.

2026-03-28 molecular biology 10.64898/2026.03.27.714689 medRxiv

Top 0.1%

8.5%

Multiplexing samples in long-read sequencing with Oxford Nanopore Next Generation Sequencing Technology (ONT) by ligating specific native barcodes to individual DNA samples enables a significant increase in sequencing throughput combined with a significant reduction in sequencing costs. However, this advantage carries the risk of barcode misassignment (crosstalk). Employing ONT multiplex sequencing, we observed misassigned barcodes, so-called barcode crosstalk, after ONT library preparation according to the standard protocol, particularly in samples with low input DNA concentrations. We assumed that these barcode misassignments are largely due to misligation of remaining native barcodes during the subsequent sequencing adapter ligation. To systematically investigate and quantify barcode crosstalk, genomic DNA (gDNA) from four bacterial type strains with different DNA input concentrations was prepared using three protocols for library preparation: the Nanopore standard protocol (protocol A: version valid until July 2, 2025), the new Nanopore protocol (protocol B: version from July 2, 2025), and an in-house protocol with pooling of the barcoded samples only after the sequencing adapter ligation step (protocol C: in-house). All samples were sequenced on a Nanopore PromethION device. The results clearly showed that the use of protocol A resulted in pronounced barcode crosstalk, especially detectable in samples with low DNA input concentrations (up to 2.4% misassigned reads). The ONT adjustment in protocol B (altered washing buffer vs. protocol A) significantly reduced barcode crosstalk to below 0.01%, whereas protocol C virtually eliminated barcode crosstalk. These observations emphasize that sequencing results obtained with older ONT native barcoding protocol variants should be critically reviewed. 
The newer ONT barcoding protocol is preferable for sequencing, but it does not completely eliminate the barcode crosstalk effect. In conclusion, for low DNA input and high accuracy sequencing, protocol C is recommended.
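
The misassignment percentages quoted above can be understood through a minimal sketch of how a per-barcode crosstalk rate might be computed when each barcode is known to carry a single strain. It assumes that reads have already been demultiplexed and assigned to a strain by alignment; the function name, the input layout, and the toy counts are hypothetical and are not taken from the study.

```python
def crosstalk_rates(read_counts, expected_strain):
    """Per-barcode fraction of reads that map to an unexpected strain.

    read_counts: dict barcode -> dict strain -> number of reads in that
                 barcode bin that aligned to the strain.
    expected_strain: dict barcode -> the strain actually loaded under it.
    Barcodes with no reads are skipped because no rate can be defined.
    """
    rates = {}
    for barcode, per_strain in read_counts.items():
        total = sum(per_strain.values())
        if total == 0:
            continue
        correct = per_strain.get(expected_strain[barcode], 0)
        rates[barcode] = (total - correct) / total
    return rates


# Toy counts for illustration only (not data from the study).
toy_counts = {
    "barcode01": {"strain_A": 990, "strain_B": 10},
    "barcode02": {"strain_A": 0, "strain_B": 500},
}
toy_expected = {"barcode01": "strain_A", "barcode02": "strain_B"}
toy_rates = crosstalk_rates(toy_counts, toy_expected)
```

Because low-input samples contribute few correct reads, the same absolute number of stray reads yields a higher rate for them, which is consistent with the abstract's observation that crosstalk is most visible at low DNA input.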